Co-Reference Resolution for the Indonesian Language Using Association Rules
Abstract
In this paper, we propose a co-reference resolution method for texts in the Indonesian language. The objective of co-reference resolution is to identify equivalence between entities, as well as between pronouns and entities, that were recognized in a named entity recognition phase. Our method uses association rules. It combines several features, such as pronoun class, name class, string similarity and position in the text, into a vector of attributes. Applied to a corpus of Indonesian newspaper articles, the method yields an F-measure of 84.12%. We compare this result with that of decision trees, a state-of-the-art machine learning method for co-reference resolution, and find the two results comparable.
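The abstract describes the method only at a high level. Each candidate pair of mentions is turned into a vector of attributes (pronoun class, name class, string similarity, position), and association rules then decide whether the two mentions co-refer. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation. It uses a CBA-style class-association-rule classifier: rules of the form "feature items -> co-refers / does not co-refer" are filtered by minimum support and confidence and applied strongest first. Several details are hypothetical: the tag names (`PERSON`, `ORG`, `LOC`, `PERS3`), the overlap-coefficient string similarity, the distance buckets, the thresholds, the helper names, and the five toy training pairs. The `f_measure` helper is a plain pairwise F1, because the abstract does not say which scoring scheme produced the 84.12% figure.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Mention:
    text: str
    sent: int                 # sentence index within the article
    name_class: str = "NONE"  # NER label, e.g. PERSON / ORG / LOC (hypothetical tag set)
    pron_class: str = "NONE"  # pronoun class, e.g. PERS3 for "ia", "dia", "beliau" (hypothetical)


def overlap(a: str, b: str) -> float:
    """Overlap coefficient of lower-cased token sets (an assumed string-similarity measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / min(len(ta), len(tb))


def pair_features(ante: Mention, ana: Mention) -> frozenset:
    """Discretise one candidate pair into categorical items (PC, NC, SS, P)."""
    sim = overlap(ante.text, ana.text)
    dist = ana.sent - ante.sent
    return frozenset({
        f"PC={ana.pron_class}",                   # pronoun class of the later mention
        f"NC={ante.name_class}",                  # name class of the earlier mention
        f"SS={'high' if sim >= 0.5 else 'low'}",  # string-similarity bucket
        f"P={'near' if dist <= 1 else 'far'}",    # position: sentence-distance bucket
    })


def mine_rules(rows, min_support=0.1, min_conf=0.7, max_len=2):
    """Mine class association rules `items -> label` from (items, label) rows.

    Returns (antecedent, label, confidence, support) tuples, strongest first.
    """
    n = len(rows)
    body = Counter()        # how often each antecedent itemset occurs
    body_label = Counter()  # how often it occurs together with each label
    for items, label in rows:
        for k in range(1, max_len + 1):
            for ante in combinations(sorted(items), k):
                body[ante] += 1
                body_label[(ante, label)] += 1
    rules = []
    for (ante, label), count in body_label.items():
        support, conf = count / n, count / body[ante]
        if support >= min_support and conf >= min_conf:
            rules.append((frozenset(ante), label, conf, support))
    # Highest confidence first, then higher support, then shorter antecedent.
    rules.sort(key=lambda r: (-r[2], -r[3], len(r[0])))
    return rules


def classify(items, rules, default=False):
    """Apply the first (strongest) rule whose antecedent is contained in `items`."""
    for ante, label, _conf, _support in rules:
        if ante <= items:
            return label
    return default


def f_measure(gold, pred):
    """Pairwise F1: harmonic mean of precision and recall over 'co-refers' decisions."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if p and not g)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


# Toy training pairs (antecedent, anaphor, co-refers?); illustrative only, not the paper's corpus.
train = [
    (pair_features(Mention("Joko Widodo", 0, "PERSON"), Mention("Widodo", 1, "PERSON")), True),
    (pair_features(Mention("Joko Widodo", 0, "PERSON"), Mention("ia", 1, pron_class="PERS3")), True),
    (pair_features(Mention("Bank Indonesia", 0, "ORG"), Mention("ia", 1, pron_class="PERS3")), False),
    (pair_features(Mention("Jakarta", 0, "LOC"), Mention("Widodo", 3, "PERSON")), False),
    (pair_features(Mention("Bank Indonesia", 0, "ORG"), Mention("Bank Indonesia", 4, "ORG")), True),
]
rules = mine_rules(train, min_support=0.15, min_conf=0.7)

query = pair_features(Mention("Prabowo Subianto", 0, "PERSON"), Mention("beliau", 3, pron_class="PERS3"))
print(classify(query, rules))  # True: the rule NC=PERSON -> co-refers fires
```

Because rules are applied in order of confidence, a strong pattern such as `NC=PERSON` can link a distant pronoun even when string similarity is low. A complete system would also need a strategy for choosing among several candidate antecedents for the same mention. The abstract does not describe that step, so the sketch leaves it out.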